58 research outputs found

    Reconstructing intelligible audio speech from visual speech features

    Get PDF
    This work describes an investigation into the feasibility of producing intelligible audio speech from only visual speech features. The proposed method aims to estimate a spectral envelope from visual features which is then combined with an artificial excitation signal and used within a model of speech production to reconstruct an audio signal. Different combinations of audio and visual features are considered, along with both a statistical method of estimation and a deep neural network. The intelligibility of the reconstructed audio speech is measured by human listeners, and then compared to the intelligibility of the video signal alone and of the video combined with the reconstructed audio.
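The source-filter reconstruction described above can be sketched as follows. This is a minimal illustration only: the frame length, pitch value, and function name are assumptions, not the authors' implementation, which estimates the envelope from visual features within a full analysis-synthesis pipeline.

```python
import numpy as np

def reconstruct_frame(envelope, voiced, f0=120.0, sr=16000, frame_len=512):
    """Shape an artificial excitation with an estimated spectral envelope.

    envelope : magnitude spectral envelope, length frame_len // 2 + 1
    voiced   : True -> impulse train at f0; False -> white noise
    """
    if voiced:
        excitation = np.zeros(frame_len)
        period = int(sr / f0)
        excitation[::period] = 1.0                 # pitch pulses
    else:
        excitation = np.random.default_rng(0).standard_normal(frame_len)
    spectrum = np.fft.rfft(excitation) * envelope  # impose the envelope
    return np.fft.irfft(spectrum, n=frame_len)

# A flat (all-ones) envelope passes the excitation through unchanged.
frame = reconstruct_frame(np.ones(257), voiced=True)
```

In a real system the per-frame outputs would be overlap-added and the envelope would come from the visual-to-audio estimator rather than being supplied by hand.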

    Voicing classification of visual speech using convolutional neural networks

    Get PDF
    The application of neural network and convolutional neural network (CNN) architectures is explored for the tasks of voicing classification (classifying frames as being either non-speech, unvoiced, or voiced) and voice activity detection (VAD) of visual speech. Experiments are conducted for both speaker dependent and speaker independent scenarios. A Gaussian mixture model (GMM) baseline system is developed using standard image-based two-dimensional discrete cosine transform (2D-DCT) visual speech features, achieving speaker dependent accuracies of 79% and 94% for voicing classification and VAD respectively. Additionally, a single-layer neural network system trained using the same visual features achieves accuracies of 86% and 97%. A novel technique using convolutional neural networks for visual speech feature extraction and classification is presented. The voicing classification and VAD results using the system are further improved to 88% and 98% respectively. The speaker independent results show the neural network system to outperform both the GMM and CNN systems, achieving accuracies of 63% for voicing classification and 79% for voice activity detection.
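The 2D-DCT visual features used by the baseline can be sketched as below. This is a generic illustration, not the paper's exact feature pipeline: real systems typically select coefficients in zig-zag order, whereas this sketch simply truncates in raster order.

```python
import numpy as np

def dct2(block):
    """Orthonormal 2D DCT-II of a square image block (rows, then columns)."""
    n = block.shape[0]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :] + 0.5
    basis = np.cos(np.pi * k * m / n) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)          # DC row scaling for orthonormality
    return basis @ block @ basis.T

def dct_features(mouth_roi, num_coeffs=20):
    """Truncate to the lowest-order coefficients as a compact visual feature
    (raster order here; a simplification of the usual zig-zag selection)."""
    return dct2(mouth_roi.astype(float)).flatten()[:num_coeffs]
```

For a uniform block all energy lands in the DC coefficient, which is a quick sanity check on the transform.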

    Objective measures for predicting the intelligibility of spectrally smoothed speech with artificial excitation

    Get PDF
    A study is presented on how well objective measures of speech quality and intelligibility can predict the subjective intelligibility of speech that has undergone spectral envelope smoothing and simplification of its excitation. Speech modifications are made by resynthesising speech that has been spectrally smoothed. Objective measures are applied to the modified speech and include measures of speech quality, signal-to-noise ratio and intelligibility, as well as the proposed normalised frequency-weighted spectral distortion (NFD) measure. The measures are compared to subjective intelligibility scores, where it is found that several have high correlation (|r| ≥ 0.7), with NFD achieving the highest correlation (r = −0.81).
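The general shape of a frequency-weighted spectral distortion measure can be sketched as below. Note this is not the paper's NFD definition, which is not given in the abstract; it is only an illustrative measure sharing the "weight the per-bin spectral error" structure, and the function name is an assumption.

```python
import numpy as np

def weighted_spectral_distortion(ref_spec, est_spec, weights=None):
    """Frequency-weighted RMS log-spectral distortion in dB, averaged over
    frames. Illustrative only; NOT the paper's NFD measure.

    ref_spec, est_spec : magnitude spectra, shape (frames, bins)
    weights            : optional per-bin weights (e.g. perceptual)
    """
    eps = 1e-12
    err = (20 * np.log10(ref_spec + eps) - 20 * np.log10(est_spec + eps)) ** 2
    if weights is None:
        weights = np.ones(ref_spec.shape[-1])
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                # normalise the weighting
    return float(np.sqrt((err * w).sum(axis=-1)).mean())
```

Such per-condition scores would then be correlated against the subjective intelligibility results, e.g. with `np.corrcoef`, to obtain the r values reported above.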

    Generating intelligible audio speech from visual speech

    Get PDF
    This work is concerned with generating intelligible audio speech from a video of a person talking. Regression and classification methods are proposed first to estimate static spectral envelope features from active appearance model (AAM) visual features. Two further methods are then developed to incorporate temporal information into the prediction: a feature-level method using multiple frames and a model-level method based on recurrent neural networks. Speech excitation information is not available from the visual signal, so methods to artificially generate aperiodicity and fundamental frequency are developed. These are combined within the STRAIGHT vocoder to produce a speech signal. The various systems are optimised through objective tests before applying subjective intelligibility tests that determine a word accuracy of 85% from a set of human listeners on the GRID audio-visual speech database. This compares favourably with a previous regression-based system that serves as a baseline, which achieved a word accuracy of 33%.
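The feature-level temporal method described above, using multiple frames, can be sketched as simple frame stacking. The function name and padding choice are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np

def stack_frames(features, context=2):
    """Feature-level temporal encoding: concatenate each visual frame with its
    +/- `context` neighbours (edges padded by repetition), so the audio
    estimator sees a short window of articulator motion rather than a single
    frame."""
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    n = len(features)
    return np.hstack([padded[i:i + n] for i in range(2 * context + 1)])
```

The model-level alternative leaves the per-frame features unchanged and instead lets a recurrent network carry the temporal context internally.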

    Reconstruction of intelligible audio speech from visual speech information

    Get PDF
    The aim of the work conducted in this thesis is to reconstruct audio speech signals using information which can be extracted solely from a visual stream of a speaker's face, with application to surveillance scenarios and silent speech interfaces. Visual speech is limited to what can be seen of the mouth, lips, teeth, and tongue, and these visual articulators convey considerably less information than the audio domain, making the task difficult. Accordingly, the emphasis is on the reconstruction of intelligible speech, with less regard given to quality. A speech production model is used to reconstruct audio speech, and methods are presented in this work for generating or estimating the necessary parameters for the model. Three approaches are explored for producing spectral-envelope estimates from visual features, as this parameter provides the greatest contribution to speech intelligibility. The first approach uses regression to perform the visual-to-audio mapping; two further approaches are then explored using vector quantisation techniques and classification models, with long-range temporal information incorporated at the feature and model level. Excitation information, namely fundamental frequency and aperiodicity, is generated using artificial methods and joint-feature clustering approaches. Evaluations are first performed using mean squared error analyses and objective measures of speech intelligibility to refine the various system configurations, and then subjective listening tests are conducted to determine the word-level accuracy of reconstructed speech, giving real intelligibility scores. The best performing visual-to-audio domain mapping approach, using a clustering-and-classification framework with feature-level temporal encoding, achieves audio-only intelligibility scores of 77% and audiovisual intelligibility scores of 84% on the GRID dataset.
    Furthermore, the methods are applied to a larger and more continuous dataset, with less favourable results, but with the belief that extensions to the work presented will yield a further increase in intelligibility.
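The clustering-and-classification idea in the thesis abstract can be sketched as a toy joint-feature vector quantiser. All names here are illustrative, and the tiny k-means with deterministic initialisation is a stand-in for the thesis's actual clustering and classification models.

```python
import numpy as np

def train_codebook(visual, audio, k=4, iters=20):
    """Joint-feature vector quantisation: cluster concatenated [visual | audio]
    vectors with a tiny k-means, keeping per-cluster means so a visual frame
    can later index an audio (spectral-envelope) estimate."""
    joint = np.hstack([visual, audio]).astype(float)
    centres = joint[:k].copy()                     # deterministic init
    for _ in range(iters):
        labels = np.argmin(((joint[:, None] - centres) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centres[c] = joint[labels == c].mean(axis=0)
    dv = visual.shape[1]
    return centres[:, :dv], centres[:, dv:]        # visual / audio codebooks

def estimate_audio(v, visual_codebook, audio_codebook):
    """Classify a visual frame to its nearest code; emit that code's audio part."""
    return audio_codebook[np.argmin(((visual_codebook - v) ** 2).sum(-1))]
```

Clustering the joint space ties each visual code to an audio estimate, which is what lets a purely visual observation at test time index a spectral-envelope candidate.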

    Using Visual Speech Information in Masking Methods for Audio Speaker Separation

    Get PDF
    This work examines whether visual speech information can be effective within audio masking-based speaker separation to improve the quality and intelligibility of the target speech. Two visual-only methods of generating an audio mask for speaker separation are first developed. These use a deep neural network to map visual speech features to an audio feature space from which both visually-derived binary masks and visually-derived ratio masks are estimated, before application to the speech mixture. Secondly, an audio ratio masking method forms a baseline approach for speaker separation, which is extended to exploit visual speech information to form audio-visual ratio masks. Speech quality and intelligibility tests are carried out on the visual-only, audio-only and audio-visual masking methods of speaker separation at mixing levels from −10 dB to +10 dB. These reveal substantial improvements in the target speech when applying the visual-only and audio-only masks, but with highest performance occurring when combining audio and visual information to create the audio-visual masks.
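The binary and ratio masks that the networks above estimate can be written down directly in their oracle form; the following is a minimal sketch of those oracle masks (function names are assumptions), not of the paper's DNN estimators.

```python
import numpy as np

def binary_mask(target_mag, interferer_mag, threshold_db=0.0):
    """Ideal binary mask: 1 in time-frequency cells where the target
    dominates the interferer by more than threshold_db."""
    ratio_db = 20 * np.log10((target_mag + 1e-12) / (interferer_mag + 1e-12))
    return (ratio_db > threshold_db).astype(float)

def ratio_mask(target_mag, interferer_mag):
    """Ideal ratio mask: a soft per-cell gain in [0, 1]."""
    return target_mag / (target_mag + interferer_mag + 1e-12)

def apply_mask(mixture_spec, mask):
    """Element-wise masking of the (complex) mixture spectrogram."""
    return mixture_spec * mask
```

In practice the target and interferer magnitudes are unknown at test time, which is exactly why the paper estimates the masks from visual and audio features instead.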

    Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

    Full text link
    Speechreading, or lipreading, is the technique of understanding speech and extracting phonetic features from a speaker's visual cues, such as the movement of the lips, face, teeth and tongue. It has a wide range of multimedia applications, such as surveillance, Internet telephony, and aids for people with hearing impairments. However, most work in speechreading has been limited to generating text from silent videos. Recently, research has started venturing into generating (audio) speech from silent video sequences, but there have been no developments thus far in dealing with divergent views and poses of a speaker. Thus, although multiple camera feeds of a speaker may be available, these multiple video feeds have not been used to deal with the different poses. To this end, this paper presents the first multi-view speech reading and reconstruction system. This work pushes the boundaries of multimedia research by putting forth a model which leverages silent video feeds from multiple cameras recording the same subject to generate intelligible speech for a speaker. Initial results confirm the usefulness of exploiting multiple camera views in building an efficient speech reading and reconstruction system. The work further shows the optimal placement of cameras which would lead to the maximum intelligibility of speech. Finally, it lays out various innovative applications for the proposed system, focusing on its potential impact not just in the security arena but in many other multimedia analytics problems. (Comment: 2018 ACM Multimedia Conference (MM '18), October 22–26, 2018, Seoul, Republic of Korea)

    SeedGerm: a cost‐effective phenotyping platform for automated seed imaging and machine‐learning based phenotypic analysis of crop seed germination

    Get PDF
    Efficient seed germination and establishment are important traits for field and glasshouse crops. Large-scale germination experiments are laborious and prone to observer error, leading to the necessity for automated methods. We experimented with five crop species, including tomato, pepper, Brassica, barley, and maize, and developed an approach for large-scale germination scoring. Here, we present the SeedGerm system, which combines cost-effective hardware and open-source software for seed germination experiments, automated seed imaging, and machine-learning based phenotypic analysis. The software can process multiple image series simultaneously and produce reliable analysis of germination- and establishment-related traits, in both comma-separated values (CSV) and processed image (PNG) formats. In this article, we describe the hardware and software design in detail. We also demonstrate that SeedGerm could match specialists' scoring of radicle emergence. Germination curves were produced based on seed-level germination timing and rates rather than a fitted curve. In particular, by scoring germination across a diverse panel of Brassica napus varieties, SeedGerm implicates a gene important in abscisic acid (ABA) signalling in seeds. We compared SeedGerm with existing methods and concluded that it could have wide utility in large-scale seed phenotyping and testing, for both research and routine seed technology applications.
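The seed-level germination curves mentioned above can be sketched as a cumulative count over per-seed germination times. The function name and data layout are illustrative assumptions, not the SeedGerm software's actual API.

```python
import numpy as np

def germination_curve(germination_times, timepoints, n_seeds):
    """Cumulative germination (%) at each imaging timepoint, computed from
    per-seed germination times (NaN = never germinated); this is the
    seed-level alternative to fitting a parametric curve."""
    times = np.asarray(germination_times, dtype=float)
    germinated = times[~np.isnan(times)]          # drop non-germinating seeds
    return np.array([100.0 * (germinated <= t).sum() / n_seeds
                     for t in timepoints])
```

Keeping the curve tied to individual seed events, rather than a fitted model, preserves per-seed timing and rate information for downstream trait analysis.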

    Networking and the development of professionals: Beginning teachers building social capital

    Get PDF
    Beginning teachers need support when starting their teaching career. Networking can help teachers develop social capital which supports their development as professionals on their career journey. This paper presents case studies of three secondary school trainee teachers during an English year-long initial teacher education programme. The relationships supporting trainees were characterised differently: for practice development as opposed to enhancing a sense of belonging to the profession. Whether supportive relationships developed depended not only on the trainees but also on others. The paper encourages those involved in teacher education to promote social-capital building as supportive of the development of teachers as professionals.
